An Improved Naive Bayes Text Classification Algorithm In Chinese Information Processing
Abstract
In Chinese information processing, Naive Bayes is a simple, easily implemented text classification method. Its core lies in computing the posterior probability and in effectively reducing the dimensionality of the feature-word space. This paper improves Naive Bayes text classification in two respects: the computation of the posterior probability and the dimensionality reduction of text feature words. Experimental results indicate that the improved method is more efficient than the original algorithm.
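The abstract does not spell out the authors' improved posterior computation. For reference only, here is a minimal sketch of the standard multinomial Naive Bayes baseline that such improvements usually start from: a document d is assigned the class c maximizing log P(c) + Σ_w log P(w|c), with Laplace (add-alpha) smoothing on P(w|c). It assumes the Chinese text has already been word-segmented; the function names `train_nb` and `predict_nb`, the `alpha` default, and the toy corpus are hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods for a
    standard multinomial Naive Bayes model (baseline, not the paper's variant).

    docs:   list of token lists; Chinese text is assumed to be word-segmented already
    labels: one class label per document
    alpha:  additive (Laplace) smoothing constant, an illustrative default
    """
    vocab = {w for doc in docs for w in doc}
    class_sizes = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)

    # log P(c): fraction of training documents in class c
    log_prior = {y: math.log(n / len(docs)) for y, n in class_sizes.items()}
    # log P(w|c) = log((count(w, c) + alpha) / (total words in c + alpha * |V|))
    log_lik = {}
    for y in class_sizes:
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
    return vocab, log_prior, log_lik


def predict_nb(doc, vocab, log_prior, log_lik):
    """Return argmax_c [log P(c) + sum_w log P(w|c)].

    Summing logs instead of multiplying probabilities avoids underflow on
    long documents; words never seen in training are skipped.
    """
    scores = {}
    for y, prior in log_prior.items():
        scores[y] = prior + sum(log_lik[y][w] for w in doc if w in vocab)
    return max(scores, key=scores.get)


# Toy usage with a hypothetical, pre-segmented corpus.
docs = [["比赛", "球队", "进球"], ["球队", "教练", "比赛"],
        ["股票", "市场", "上涨"], ["市场", "银行", "股票"]]
labels = ["sports", "sports", "finance", "finance"]
model = train_nb(docs, labels)
print(predict_nb(["球队", "进球"], *model))  # prints "sports"
```

Working in log space is the usual design choice here: a product of hundreds of per-word probabilities underflows to zero in floating point, while the sum of their logarithms preserves the ranking of classes.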
Similar resources
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Review Paper on Sentiment Analysis of Twitter Data Using Text Mining and Hybrid Classification Approach
In sentiment analysis, natural language processing and information extraction are used to extract writers' comments or reviews. This paper uses text mining and a hybrid of the KNN and Naïve Bayes algorithms to determine the sentiments of Indian people on Twitter.
Iwona Żak, Marcin Ciura: Automatic Text Categorisation
The paper presents a module for classifying Polish text, intended for use in the automatic processing of job advertisements. Two classification algorithms are implemented: a naive Bayes classifier and the TFIDF algorithm. Stop lists and stemming are used to improve processing efficiency.
Implementation and Evaluation of Scalable Approaches for Automatic Chinese Text Categorization
The purpose of this research is to identify scalable approaches that can handle large amounts of training data, such as several years of news articles, and automatically assign predefined categories to Chinese free-text documents. Our approach consists of the following processes: (i) term extraction, (ii) term selection, and (iii) document classification. The approach first builds a recently develo...
The study on the spam filtering technology based on Bayesian algorithm
This paper analyzed spam filtering technology, carried out a detailed study of the Naive Bayes algorithm, and proposed an improved Naive Bayesian mail filtering technique. The improvements lie in text selection and feature extraction. The general Bayesian text classification algorithm mostly uses information gain and cross-entropy for feature selection. Through the principle o...
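The last entry names information gain as a common feature-selection criterion for Bayesian text classifiers, and reducing the dimensionality of feature words is one of the two targets of the main abstract. A minimal sketch of information-gain ranking follows, assuming binary term presence per document and pre-segmented text. It shows only the generic criterion IG(t) = H(C) - P(t)H(C|t) - P(¬t)H(C|¬t), not the improved reduction method of any paper listed here; the helper names `entropy`, `information_gain`, `select_features` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t),
    where t is the binary event "term occurs in the document"."""
    n = len(docs)
    with_t = Counter(y for d, y in zip(docs, labels) if term in d)
    without_t = Counter(y for d, y in zip(docs, labels) if term not in d)
    ig = entropy(list(Counter(labels).values()))
    for part in (with_t, without_t):
        size = sum(part.values())
        if size:  # an empty partition has probability 0 and contributes nothing
            ig -= (size / n) * entropy(list(part.values()))
    return ig


def select_features(docs, labels, k):
    """Reduce the vocabulary to the k terms with the highest information gain."""
    vocab = sorted({w for d in docs for w in d})
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)[:k]


# Toy usage with a hypothetical, pre-segmented corpus.
docs = [["比赛", "球队", "进球"], ["球队", "教练", "比赛"],
        ["股票", "市场", "上涨"], ["市场", "银行", "股票"]]
labels = ["sports", "sports", "finance", "finance"]
print(select_features(docs, labels, 4))
```

In this toy corpus, terms that split the two classes cleanly, such as 比赛 and 股票, reach the maximum of 1 bit, while terms seen in only one document (such as 进球) score lower and are the first to be dropped.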
Publication date: 2010